In Figure 1 we show the common neighbor (CN) distribution among positive and negative test samples for ogbl-collab, ogbl-ppa, and ogbl-citation2. These results demonstrate that the vast majority of negative samples have no CNs. Since the CN count is typically a strong heuristic for link prediction, this makes most negative samples easy to identify. We further present the CN distribution of Cora, Citeseer, Pubmed, and ogbl-ddi in Figure 3. The CN distributions of Cora, Citeseer, and Pubmed are consistent with our previous observations on the OGB datasets in Figure 1. We note that ogbl-ddi exhibits a different distribution from the other datasets: most of its negative samples do have common neighbors. This is likely because ogbl-ddi is considerably denser than the other graphs.
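As a concrete illustration of how such a distribution can be computed, the following sketch counts common neighbors for candidate edges with networkx. The toy graph and edge lists are placeholders; in practice they would come from the benchmark loaders (e.g., OGB), which are not shown here.

```python
import networkx as nx
from collections import Counter

def cn_histogram(G, edges):
    """Histogram of common-neighbor counts over a list of candidate edges."""
    counts = Counter()
    for u, v in edges:
        counts[len(list(nx.common_neighbors(G, u, v)))] += 1
    return counts

# Toy usage on a built-in graph; the negative pairs below are hypothetical.
G = nx.karate_club_graph()
pos_edges = list(G.edges())[:10]
neg_edges = [(0, 9), (1, 33), (4, 12)]
print("positives:", cn_histogram(G, pos_edges))
print("negatives:", cn_histogram(G, neg_edges))
```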
Simplify and Robustify Negative Sampling for Implicit Collaborative Filtering
Since implicit feedback data contain positive instances only, the implicit CF problem is also related to learning from positive-unlabeled (PU) data. However, the amount of unlabeled data in implicit CF can approach the product of the user count and the item count, making the above non-sampling approach unaffordable in terms of learning efficiency. Negative sampling approaches have also been widely adopted in other domains of embedding learning, e.g., for text and graphs. Motivated by these works, which tend to leverage a simple model for … SRNS's hyper-parameters can be divided into three parts: (1) the sampling-related part, including … In the synthetic-noise experiments, since we do not explicitly split a validation set on the synthetic data, we draw two different train/test splits. In the real-data experiments, we follow the standard procedure of splitting into train/validation/test sets.
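For context, the simplest baseline that such work builds on is uniform negative sampling over unobserved user-item pairs. The sketch below shows that baseline only, not SRNS itself; all names and the toy interaction set are illustrative.

```python
import random

def sample_negatives(user, user_pos_items, n_items, k):
    """Uniformly sample k items the user has not interacted with.

    user_pos_items: dict mapping user -> set of observed (positive) items.
    Unobserved items are treated as unlabeled, so a sampled "negative" may
    in fact be an unobserved positive (a false negative) -- the PU issue
    noted above.
    """
    observed = user_pos_items[user]
    negatives = []
    while len(negatives) < k:
        item = random.randrange(n_items)
        if item not in observed:
            negatives.append(item)
    return negatives

# Toy usage with a hypothetical interaction set.
interactions = {0: {1, 5, 7}, 1: {2, 3}}
print(sample_negatives(0, interactions, n_items=10, k=3))
```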
Semi-supervised Multi-label Learning with Balanced Binary Angular Margin Loss
Semi-supervised multi-label learning (SSMLL) refers to inducing classifiers from a small number of samples annotated with multiple labels together with many unlabeled samples. The prevalent solution to SSMLL forms pseudo-labels for the unlabeled samples and induces classifiers from both labeled and pseudo-labeled samples in a self-training manner. Unfortunately, with the commonly used binary type of loss and negative sampling, we have empirically found that learning from labeled and pseudo-labeled samples can produce a variance bias between the feature distributions of positive and negative samples for each label. To alleviate this problem, we aim to balance the variance bias between positive and negative samples from the perspective of the feature-angle distribution for each label. Specifically, we extend the traditional binary angular margin loss to a balanced variant with feature-angle distribution transformations under a Gaussian assumption, where the distributions are iteratively updated during classifier training. We also propose an efficient prototype-based negative sampling method to maintain high-quality negative samples for each label. Building on these ideas, we propose a novel SSMLL method, Semi-Supervised Multi-Label Learning with Balanced Binary Angular Margin loss (S$^2$ML$^2$-BBAM). To evaluate its effectiveness, we compare S$^2$ML$^2$-BBAM with existing competitors on benchmark datasets.
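To make the starting point concrete, here is a minimal sketch of a plain per-label binary angular margin loss (ArcFace-style), i.e., the unbalanced baseline that the paper extends. The balanced Gaussian-transformation variant and the prototype-based sampler are not reproduced; all shapes and hyper-parameters are assumptions.

```python
import torch
import torch.nn.functional as F

def binary_angular_margin_loss(features, prototypes, labels,
                               margin=0.3, scale=16.0):
    """Plain binary angular-margin loss, applied per label.

    features:   (N, D) sample embeddings
    prototypes: (L, D) one learnable prototype per label
    labels:     (N, L) multi-hot targets in {0, 1}
    """
    cos = F.normalize(features, dim=-1) @ F.normalize(prototypes, dim=-1).t()
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
    # Positives get the margin added to their angle (a stricter boundary);
    # negatives keep the raw angle.
    logits = scale * torch.cos(theta + margin * labels)
    return F.binary_cross_entropy_with_logits(logits, labels)

# Toy usage with hypothetical shapes.
feats = torch.randn(8, 64)
protos = torch.randn(5, 64, requires_grad=True)
y = torch.randint(0, 2, (8, 5)).float()
binary_angular_margin_loss(feats, protos, y).backward()
```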
Energy-Based Models for Anomaly Detection: A Manifold Diffusion Recovery Approach
We present a new method of training energy-based models (EBMs) for anomaly detection that leverages low-dimensional structure in data. The proposed algorithm, Manifold Projection-Diffusion Recovery (MPDR), first perturbs a data point along a low-dimensional manifold that approximates the training dataset. The EBM is then trained to maximize the probability of recovering the original data point. The training involves generating negative samples via MCMC, as in conventional EBM training, but from a different distribution concentrated near the manifold. The resulting near-manifold negative samples are highly informative, reflecting relevant modes of variation in the data. An energy function trained with MPDR learns accurate boundaries of the training data distribution and excels at detecting out-of-distribution samples. Experimental results show that MPDR performs strongly across anomaly detection tasks involving diverse data types, such as images, vectors, and acoustic signals.
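The following sketch illustrates the general procedure the abstract describes: perturb in a latent space that approximates the data manifold, decode, then run short-run Langevin MCMC under the energy. It is an interpretation under stated assumptions, not the paper's implementation; `encoder`, `decoder`, and `energy_fn` are assumed pretrained modules, and all step sizes are placeholders.

```python
import torch

def manifold_negative_samples(x, encoder, decoder, energy_fn,
                              noise_std=0.1, steps=20, step_size=0.01):
    """Draw negative samples concentrated near the data manifold."""
    with torch.no_grad():
        z = encoder(x)                           # project onto the manifold
        z = z + noise_std * torch.randn_like(z)  # diffuse in latent space
        x_neg = decoder(z)                       # map back to input space
    x_neg = x_neg.detach().requires_grad_(True)
    for _ in range(steps):                       # short-run Langevin MCMC
        grad = torch.autograd.grad(energy_fn(x_neg).sum(), x_neg)[0]
        with torch.no_grad():
            x_neg = (x_neg - 0.5 * step_size * grad
                     + (step_size ** 0.5) * torch.randn_like(x_neg))
        x_neg.requires_grad_(True)
    return x_neg.detach()
```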
Test-Time Distribution Normalization for Contrastively Learned Visual-language Models
Advances in visual-language contrastive learning have made it possible for many downstream applications to be carried out efficiently and accurately by simply taking the dot product between image and text representations. One of the most representative recent approaches, CLIP, has quickly garnered widespread adoption due to its effectiveness. CLIP is trained with an InfoNCE loss that takes into account both positive and negative samples to help learn a much more robust representation space. This paper, however, reveals that the common downstream practice of taking a dot product is only a zeroth-order approximation of the optimization goal, resulting in a loss of information at test time. Intuitively, since the model has been optimized with the InfoNCE loss, test-time procedures should ideally also be aligned with it. The question is how one can retrieve any semblance of negative-sample information during inference in a computationally efficient way. We propose Distribution Normalization (DN), where we approximate the mean representation of a batch of test samples and use this mean to represent what would be analogous to the negative samples in the InfoNCE loss. DN requires no retraining or fine-tuning and can be effortlessly applied during inference. Extensive experiments on a wide variety of downstream tasks exhibit a clear advantage of DN over the plain dot product, on top of other existing test-time augmentation methods.
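A minimal sketch of the general idea, under the assumption that the normalization amounts to centering each modality by a mean embedding estimated from a batch of test samples before the dot product; the paper's exact normalization may differ, and all names below are illustrative.

```python
import torch
import torch.nn.functional as F

def dn_similarity(img_emb, txt_emb, img_mean, txt_mean):
    """Distribution-normalized similarity (sketch): center each modality by
    an estimated mean, then take the usual dot product."""
    return (img_emb - img_mean) @ (txt_emb - txt_mean).t()

# Hypothetical usage: means estimated once from unlabeled test batches.
img_batch = F.normalize(torch.randn(256, 512), dim=-1)
txt_batch = F.normalize(torch.randn(256, 512), dim=-1)
img_mean, txt_mean = img_batch.mean(0), txt_batch.mean(0)
scores = dn_similarity(img_batch[:4], txt_batch[:4], img_mean, txt_mean)
```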
Why do We Need Large Batchsizes in Contrastive Learning? A Gradient-Bias Perspective
Contrastive learning (CL) has become the de facto technique for self-supervised representation learning (SSL), with impressive empirical success in settings such as multi-modal representation learning. However, the traditional CL loss considers only the negative samples within a minibatch, which can cause biased gradients due to the non-decomposability of the loss. For the first time, we consider optimizing a more general contrastive loss in which each data sample is associated with an infinite number of negative samples. We show that directly applying minibatch stochastic optimization can lead to gradient bias. To remedy this, we propose an efficient Bayesian data-augmentation technique that augments the contrastive loss into a decomposable one, to which standard stochastic optimization can be applied directly without gradient bias. Specifically, our augmented loss defines a joint distribution over the model parameters and the augmented parameters, which can be conveniently optimized by a proposed stochastic expectation-maximization algorithm.
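To make the in-batch-negatives issue concrete, here is a standard symmetric-view minibatch InfoNCE: the normalizer for each anchor sums only over the B-1 negatives present in the batch, so the stochastic gradient depends on batch composition, which is the source of the bias the paper analyzes. This is the conventional loss, not the paper's augmented, decomposable one.

```python
import torch
import torch.nn.functional as F

def minibatch_info_nce(z_a, z_b, temperature=0.07):
    """Standard in-batch InfoNCE: for each anchor in view A, the positive is
    the matching row of view B and the negatives are the other B-1 rows."""
    z_a = F.normalize(z_a, dim=-1)
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(z_a.size(0))    # diagonal entries are positives
    return F.cross_entropy(logits, targets)

loss = minibatch_info_nce(torch.randn(32, 128), torch.randn(32, 128))
```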